Large Margin Algorithms for Discriminative Continuous Speech Recognition
Authors
Abstract
Automatic speech recognition (ASR) has long been considered a dream. While ASR works today and is commercially available, it is extremely sensitive to noise, talker variation, and environment. Current state-of-the-art automatic speech recognizers are based on generative models that capture some temporal dependencies, such as hidden Markov models (HMMs). While HMMs have been immensely important in the development of large-scale speech processing applications, and in particular speech recognition, their performance is far from that of a human listener. HMMs have several drawbacks, both in modeling the speech signal and as learning algorithms. The present dissertation develops fundamental algorithms for continuous speech recognition that are not based on HMMs. These algorithms build on the latest advances in large margin and kernel methods, and they aim at minimizing the error induced by the speech recognition problem. Chapter 1 gives a basic introduction to the current state of automatic speech recognition with HMMs and its limitations. We also present the advantages of large margin and kernel methods and give a short outline of the thesis. In Chapter 2 we present large-margin algorithms for the task of hierarchical phoneme classification. Phonetic theory of spoken speech embeds the set of phonemes of Western languages in a phonetic hierarchy where the phonemes constitute the leaves of the tree, while broad phonetic groups, such as vowels and consonants, correspond to internal vertices. Motivated by this phonetic structure, we propose a hierarchical model that incorporates the notion of similarity between phonemes and between phonetic groups. As in large margin methods, we associate a vector in a high dimensional space with each phoneme or
Similar resources
Large Margin Training of Acoustic Models for Speech Recognition
Fei Sha. Advisor: Prof. Lawrence K. Saul. Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there ha...
Structured Support Vector Machines for Speech Recognition
Discriminative training criteria and discriminative models are two effective improvements for HMM-based speech recognition. This thesis proposed a structured support vector machine (SSVM) framework suitable for medium to large vocabulary continuous speech recognition. An important aspect of structured SVMs is the form of features. Several previously proposed features in the field are summarized in ...
Large Margin Training of Continuous Density Hidden Markov Models
Continuous density hidden Markov models (CD-HMMs) are an essential component of modern systems for automatic speech recognition (ASR). These models assign probabilities to the sequences of acoustic feature vectors extracted by signal processing of speech waveforms. In this chapter, we investigate a new framework for parameter estimation in CD-HMMs. Our framework is inspired by recent parallel t...
A log-linear discriminative modeling framework for speech recognition
Conventional speech recognition systems are based on Gaussian hidden Markov models (HMMs). Discriminative techniques such as log-linear modeling have been investigated in speech recognition only recently. This thesis establishes a log-linear modeling framework in the context of discriminative training criteria, with examples from continuous speech recognition, part-of-speech tagging, and handwr...
Large-margin minimum classification error training: A theoretical risk minimization perspective
Large-margin discriminative training of hidden Markov models has received significant attention recently. A natural and interesting question is whether the existing discriminative training algorithms can be extended directly to embed the concept of margin. In this paper, we give this question an affirmative answer by showing that the sigmoid bias in the conventional minimum classification error...